Shared-Memory Parallelization of MTTKRP for Dense

نویسندگان

  • Koby Hayashi
  • Grey Ballard
  • Jeffrey Jiang
  • Michael Tobia
چکیده

Œe matricized-tensor times Khatri-Rao product (MTTKRP) is the computational boŠleneck for algorithms computing CP decompositions of tensors. In this paper, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors. Œe algorithms cast nearly all of the computation as matrix operations in order to use optimized BLAS subroutines, and they avoid reordering tensor entries in memory. We benchmark sequential and parallel performance of our implementations, demonstrating high sequential performance and ecient parallel scaling. We use our parallel implementation to compute a CP decomposition of a neuroimaging data set and achieve a speedup of up to 7.4× over existing parallel so‰ware. ACM Reference format: Koby Hayashi, Grey Ballard, Yujie Jiang, Michael J. Tobia. 2018. SharedMemory Parallelization of MTTKRP for Dense . In Proceedings of , , , 12 pages. DOI: 10.1145/nnnnnnn.nnnnnnn

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Experiments with Cholesky Factorization on Clusters of SMPs

Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...

متن کامل

Recursion based parallelization of exact dense linear algebra routines for Gaussian elimination

We present block algorithms and their implementation for the parallelization of sub-cubic Gaussian elimination on shared memory architectures. Contrarily to the classical cubic algorithms in parallel numerical linear algebra, we focus here on recursive algorithms and coarse grain parallelization. Indeed, sub-cubic matrix arithmetic can only be achieved through recursive algorithms making coarse...

متن کامل

cient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures ?

This paper presents a new parallelization method for an efcient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized witho...

متن کامل

Computer Science Technical Report Canonic Multi-Projection: Memory Allocation for Distributed Memory Parallelization

The Polyhedral model is now the accepted technology for automatic parallelization of affine control loop programs. It has been successful in automatically generating tiled shared memory parallel programs for shared memory platforms (plus vectorization). We address the challenges arising when we move toward distributed memory parallelization, based on wavefront execution of parameterized tiles. ...

متن کامل

Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures

This paper presents a new parallelization method for an ef-cient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized with...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017